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1 Introduction 


In population and evolutionary biology, hypotheses about micro-evolutionary processes (e.g. population ge¬ 
netics) and macro-evolutionary processes (e.g. speciation and extinction) are commonly tested by comparing 


frequencies of the shapes of empirical evolutionary trees with those predicted by null models (see, e.g. Mooers 


and Heard 1997 Nordborg, 2001 Blum and Frangois, 2006 Purvis et al 2011 Hagen et al, 20151. One chal¬ 


lenge in this approach is the ability to compute the distributions of various tree shape indices under the models 
of interest, which is needed in statistical testing for calculating the p-value of the empirical shape statistics or 
constructing a confidential interval. Even for some relatively simple null models, this can still be a challenging 
task. Many current approaches are based on approximating techniques, such as Monte Carlo sampling (see. 


e.g. Blum and Frangois 20061 or Gaussian approximation (see, e.g. McKenzie and Steel 20001, which could 


be computationally intensive or restricting the tests to the trees above a certain size. Therefore it is desirable 
to explore efficient ways of computing these distributions exactly. 

Two widely used null models for generating random trees in population and evolutionary biology are the 


Yule-Harding-Kingman (YHK) model (Harding, 1971 Yule 1925 Kingman 19821 and the proportional to 


different arrangements (PDA) model (Aldous 2001). Under the PDA model all rooted binary trees of the same 


size are chosen with the same probability (Aldous, 20011 whilst under the YHK model each tree is chosen with 


a probability proportional to the number of total orderings that can be assigned to its internal nodes so that 
the relative partial ordering derived from the tree topology is preserved. 

In this paper, we are interested in the exact computation of the joint distribution for the number of subtrees 


under the YHK and PDA model. Here a subtree, also known as a fringe subtree in Aldous (1991), consists of 


a node and all its descendants. More specifically, we study the distributions of the number of cherries, subtrees 
with two leaves, and that of pitchforks, subtrees with three leaves. Note that this is equivalent to study the 


joint distributions of 2-pronged and 3-pronged nodes as defined in (Rosenberg 2006), as well as the joint 


distributions of clades of size two and three as defined in (Zhu et al, 2011). 


We now describe the contents of the rest of this paper. In the next section we gather some necessary 
notation and background. In particular, we present a random tree generating process for realising both the 


YHK and PDA models as described in McKenzie and Steel (2000). In contrast to the splitting model that 


were used in several previous studies concerning the asymptotical distributions of subtrees (see, e.g. Chang 


and Fuchs 2010), the process used here is based on iteratively attaching leaves. We therefore also collect 


some observations on the change of the numbers of cherries and pitchforks in a tree when an additional leaf is 
attached. 

In Sections and we study subtree distributions under the YHK and the PDA models, respectively. 
Our main results include two novel recursive formulae on the joint distributions of cherries and pitchforks; 
see Theorem [T] for the one under the YHK model and Theorem |4] for the one under the PDA model. These 
recursions enable us to develop a dynamic approach to numerically compute the joint distributions, and hence 
also their marginal distributions, for trees of any size. As an example, in Fig. we illustrate the distributions 
for trees with 200 leaves, which suggests the joint distributions under both models can be well-approximated 
by bivariate normal distributions. 
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The YHK model 



The PDA model 



Fig. 1: Probability density function for the joint distribution of cherries and pitchforks on phylogenetic trees 
with 200 leaves. 

Rewritten in functional forms, the recursions also provide a way to compute the covariance and correlation 
of the joint distributions under these two models. Somewhat surprisingly, we find that under the YHK model 
the correlation between the cherry and the pitchfork distributions is a constant — ^14/69, which is independent 
of the number of leaves (see, e.g.. Corollary |^. In addition, the recursions also lead to an alternative, and 
arguably more unified approach to computing the moments of these two distributions, and we demonstrate 
this by reaffirming several results obtained in previous studies. 

Using the recursions on cherry distribution derived from the joint distribution, we obtain in Theoremj^the 
exact formula for cherry distribution under the PDA model, and derive some interesting properties concerning 
cherry distribution in general, including that this distribution is log-concave and hence unimodal under both 
models (see Theorems and [^ . 

In Section we present a comparative study of cherry and pitchfork distributions under the YHK and 
PDA models. We first compare the mean and the variance of these two distributions under these two models. 
Then we show in Theorem|^that there exists a unique change point when comparing cherry distribution, that 
is, there exists a critical value for each n > 4 such that the probability that a random tree with n leaves 
generated under the YHK model contains k cherries is lower than that under the PDA model if 1 < fc < r„, 
and higher if r„ < fc < n/2. Finally, we conclude in Sectionwith discussions and some open problems. 
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2 Preliminaries 


For later use, we present in this section some basic notation and results concerning phylogenetic trees. Through¬ 
out this article, X denotes a finite set with \X\ = n > 2. 


Phylogenetic trees A phylogenetic tree T = (V{T), E(T)) on A is a rooted tree with leaf set L{T) = X 
such that the root has one child whilst all other vertices have either zero or two children (see Fig. for an 
example). Note that in this paper phylogenetic trees are rooted, with their edges directed away from the root. 
In addition, for technical simplicity we assume without loss of generality that the root has one child (also 


referred to as planted phylogenetic trees by Baroni et al (2005)). Let E*{T) be the set of pendant edges in T, 
i.e., those edges incident with a leaf. Then we have \E{T)\ = 2n — 1 and \E*{T)\ = n. 


r r 




Fig. 2: Examples of phylogenetic trees. T is a phylogenetic tree on A = {xi, ... ,a; 6 }, and T' = T[eg;x 7 ] is a 
phylogenetic tree on {x\, ... .X'j} that is obtained from T by attaching the leaf labelled xy to edge eg. Here 
the directions of all edges are directed away from the root r, and hence omitted for simplicity. 


Let e be an edge in a phylogenetic tree T. The tree consisting of e and all edges below e is called a subtree 
of T, and is denoted by T{e). In particular, a cherry is a subtree with two leaves, and a pitchfork is a subtree 
with three leaves. The number of cherries and pitchforks contained in T are denoted by C{T) and A(T), 
respectively. Note first that we always have 1 < C(T) < n/2 and 0 < A{T) < n/3. Moreover, in our definition 
a cherry contains three edges and a pitchfork contains five edges. As an example, for the tree T depicted in 
Fig. we have C{T) = 2 and A{T) = 1. In addition, T{e^) is a pitchfork with edge set {ei, eg, 65 , ey, es}, and 
T(ey) is a cherry with edge set {ei,e 5 ,ey}. Finally, C{T) and A{T) are respectively equal to the number of 


2-pronged nodes and 3-pronged nodes contained in T (see Rosenberg (20061 for the definitions of r-pronged 
nodes). 

Given an edge e in a phylogenetic tree T and a taxon xq ^ L{T), let T[e; xq] be the phylogenetic tree 
obtained from T by attaching a new leaf labelled with xq to the edge e. Formally, let e = {u,v} and let w 
be a vertex not contained in V{T), then r[e;a;o] has vertex set V{T) U {xo,w} and edge set {E(T) \ {e}) U 
{{u, w), {v, w), {w, xo)} (see Fig. i for an illustration of this construction). When the labelling of the new leaf 
is clear from the context, r[e;a;o] is abbreviated to T[e]. 
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The YHK and the PDA model In this subsection, we present a formal definition of the two null mod¬ 
els investigated in this paper: the proportional to distinguishable arrangements (PDA) model and the Yule- 


Harding-Kingman (YHK) model. In contrast to the splitting process used by Aldous (2001) to accommodate 
the two models, the random process used here is based on iteratively attaching leaves. 


Under the Yule-Harding model (Harding 1971 Yule 1925), a rooted phylogenetic tree on X is generated 
as follows. Beginning with the tree with two leaves, we “grow” it by repeatedly uniformly sampling a pendant 
edge e in the current tree T^ur and replace Tcur by T(.ur[s\- This process continues until a binary tree with 
n leaves is obtained. Finally, we label each of its leaves with a label sampled randomly uniformly (without 


replacement) from {xi,..., x„}. When branch lengths are ignored, the Yule-Harding model is shown by Aldous 


(1996) to be equivalent to the trees generated by the coalescent process in population genetics introduced by 


Kingman (1982), and so we call it the YHK model. The probability of generating a tree T under this model 


is denoted by Vy{T). 

Let Tn be the set of phylogenetic trees with leaf set {xi,..., x„}. It is well known that the number of trees 
contained in Tn is <p{n) := {2n — 3)!! = 1 x 3 x • • • x (2n — 3) (see e.g. Semple and Steel, 2003). Here we adopt 
the convention that (/^(l) = 1. Under the PDA model, each tree has the same probability, that is, l/ip{n), to 
be generated. Alternatively, a tree can be generated under the PDA model using a Markov process similar 
to the one used in the YHK model; the only difference is that the edge e is uniformly sampled from E{T), 
instead of E*{T). We use E^, Vy, Covy and py to denote respectively the expectation, variance, covariance 
and correlation taken with respect to the probability measure Py under the YHK model. Similarly, E„, V^, 
Covu and pu are defined with respect to the probability P„ under the PDA model. 

For n> 2, let A„ (resp. Cn) be the random variable A{T) (resp. C(T)) for a random tree T in Tn- In this 
paper, we are interested in the joint distributions and the marginal properties of An and C„ under the YHK 
and the PDA models. 


Subtree Pattern For later use, we present in this subsection several technical results concerning the change 
of the numbers of cherries and pitchforks when a new leaf is attached to a phylogenetic tree. 

We begin with the following notation. Given a phylogenetic tree T, let Ei{T) be the set of pendant edges 
that are contained in a pitchfork but not a cherry; E 2 {T) the set of edges in T that are contained in a cherry but 
not in a pitchfork; E^iT) the set of pendant edges that are contained in neither a cherry nor a pitchfork; and 
E 4 {T) = E{T)\{Ei{T)LiE 2 (T)LlE 4 {T)). For instance, for the tree T depicted in Fig.[^ we have Ei(T) = {es}, 
E 2 {T) = { 62 , 64 , 69 }, E^{T) = {cq} and E 4 {T) = { 60 , 61 , 65 , 67 , 68 , 610 }. In addition, E(T) can be decomposed 
into the disjoint union of these four sets of edges. The following lemma, whose proof is straightforward and 
hence omitted here, shows this observation holds for all phylogenetic trees, where U denotes disjoint union. 

Lemma 1 Suppose that T is a phylogenetic tree with n leaves. Then we have 

E{T) = ETT) U E2lT) U E:,{T) U E^iT). (1) 


In addition, we have \Ei{T)\ = A{T), \E 2 {T)\ = 3{C{T) - A{T)), \E 3 {T)\ = n-A{T)-2C{T), and \E 4 {T)\ = 
n-l + 3A{T)-C{T). 





















6 


Taoyang Wu, Kwok Pui Choi 


The last lemma provides a decomposition for the set of edges in a phylogenetic tree, which is useful to the 
study of the PDA model. For the YHK model, we need an analogous decomposition for E*{T), the set of the 
pendant edges in T. To this end, note first that we have Ei{T) C E*{T) and E'j,{T) C E*{T). In addition, let 
E*(T) := Ei{T) n E*(T) be the set of pendant edges in Ei(T) for i = 2,4. Then we have the following lemma, 
whose proof is straightforward and hence omitted. 

Lemma 2 Suppose that T is a phylogenetic tree with n leaves. Then we have 

E*{T) = Ei{T) U E*{T) U Es^T) U £;4(r). (2) 

In addition, we have |if 2 (T)| = 2{C{T) — AiT)) and \E^(T)\ = 2A(T). 

We end this section with the following result relating the values C{T[e]) — C{T) and A{T[e]) — A{T) to 
the choice of e. 

Proposition 1 Suppose that e is an edge in a phylogenetic tree T and T' = T[e\. Then we have 

{ A{T) ife&E3(T)VjEi{T), f C(T) if e & E2(T)VJ E^{T), 

A{T )-1 ifeeEi(T), and C(T') = < 

AiT) + l ife&E2{T)-, [c(T) + l if e & Ei(T) VJ E^iT). 

Proof Let {Fi ,... ,Ek} be the set of pitchforks contained in T, and let {Hi,... ,Hi} {I > k) be the set of 
cherries contained in T. Here we may assume that indices are chosen in the way so that Hi is contained in Fi. 

Suppose first that e = {u,v) G Ei{T). Swapping the labelling of Fi if necessary, we may assume that e 
is the pendant edge contained in the pitchfork Fi but not in the cherry Hi. Let uq be the parent of u, and 
let Ml be the child of u that is distinct from v. In addition, let w be the newly added interior vertex in T'. 
Now consider cq = (mo,m) and e' = {u,w) in T'. Then T'{eo) is not a pitchfork as Uq has four leaves as 
its descendants. On the other hand, T'{e') is a cherry of T' that is not contained in T. Therefore, we have 
A{T') = k — 1 and C(T') = Z + I, as required. 

By a similar argument, we can establish the proposition for the other three cases, i.e., e G Ei{T) for 
2 < j < 4. Since by Lemma these four cases cover all possible choices of e, the proposition follows. □ 

One useful consequence of the last proposition is the following corollary, whose proof is straightforward 
and hence omitted. 

Corollary 1 Suppose that e is an edge in a phylogenetic tree T with A(T) = a and C{T) = b. Then for the 
phylogenetic tree T' = T[e], we have 

{A{T'),C{T')) e {(a - 1, & + 1), (a + 1, h), (a, 6+1), (a, 6 )} 

according to the index i (I < i < ^) with e G Ei(T). 

3 Subtree Distributions under the YHK Model 

In this section, we study the distributions of the random variables A„ and C„ under the YHK model. Our 
starting point is the following recursion on their joint distribution. 
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Theorem 1 We have 


,{An+i = a,C„+i =b) = -FyiA^ = a,Cn = b) + 


Fy{An — a + 1 , Cn — b — 1) 


+ ° + -i > .p,(>l„ = a - l.C„ = fc) + ° = „,C„ = I, - 1) 

for n > 3 and 1 < b < n. Moreover, Fy{A^ = a,C^ = b) equals to 1 if (a, b) = (1,1), and 0 otherwise. 


Proof Fix n > 3, and let T 2 ,..., T„, T^+i be a sequence of random trees generated by the YHK process, that 
is, T 2 contains two leaves and Ti+i = Ti[ei] for a uniformly chosen pendant edge in Ti for 2 < j < n. In 
particular, we have |if*(Ti)| = i for 2 < i < n + 1. Then we have 

^y{An+l = a, C'n+l = b) = P(^(T„+i) = a, C(Tn+l) = b) 

= Y, P(^(rn+i) = a, C(T„+i) = b I A{T^) = p, C{Tn) = g)P(A(r„) = p, C'(T„) = q) 

p,q 

= Y nMTn+i) = a, C(T„+i) = b I AiT„) = p, C(Tn) = q)Fy{An =p,Cn = q), (3) 

P,Q 

where the first and second equalities follow from the law of total probability, and the definition of random 
variables An and Cn- 

Let Cn be the pendant edge in Tn chosen in the above YHK process for generating Tn+i, that is, T„+i = 
T„[e„]. Since Corollary implies that 

P(H(r„+i) = a, CiTn+i) = b I A{Tn) = p, C(T„) = q) = 0 (4) 

for (p, q) ^ {(a, 6 ), (o + 1 , 6 — 1 ), (a — 1 , b), (a, b — 1 )}, it suffices to consider the following four cases in the 
summation in ([^: case (i): p = a,q = b; case (ii): p = a + l,q = b — 1; case (iii): p = a — l,q = b] and case 
(iv): p = a,q = b — 

Firstly, Propositionimplies that case (i) occurs if and only if e„ S i? 4 (T„) n Together 

with Lemma 1^ we have 

P(H(r„+i) = a,C{Tn+i) = b I A{Tn) = a,C(T„) =b) = (5) 

Similarly, Propositionimplies that case (ii) occurs if and only if e„ S Ei{Tn) C E*{Tn) = EiiTn). Hence 
by Lemma we have 

P(H(r„+i) = a, ClTn+i) = b I A{Tn) = a + 1,C(T„) =b-l) = = ^. ( 6 ) 

\E*{Tn)\ n 

Next, Propositionimplies case (iii) occurs if and only if e„ G E 2 {Tn) C E*{Tn) = E 2 {Tn)- Hence using 
Lemma [ 1 ] we have 

P(H(T„+i) = a, C{Tn+i) = b I A{Tn) = a + 1,C(T„) = b) = ( 7 ) 

Finally, by Proposition case (iv) occurs if and only if e„ is contained in E^{Tn) n E*{Tn) = if|(T'„). 
Hence by Lemma it follows that 

P(H(T„+i) = a,C{Tn+, = b) I A{Tn) = a,C(r„) = b-l) = = ri-a-^2b + 2 ^ 

Now substituting Eq. into Eq. @ completes the proof of the theorem. □ 
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The recursion in the last theorem can be used for a dynamic approach to numerically compute the joint 
distribution of An and Cn ■ More precisely, let Mm {m > 3) be the (m+ 1) x (to + 1) matrix whose (i, j)-entry 
is ¥y{Am = i — l,Cm = j — !)■ Then M 3 contains a unique non-zero entry, which is at position (2,2) and 
has a value of 1 . Next, starting with to = 4 and assuming that Mm-i is already constructed, each entry in 
Mm can be computed using time 0 ( 1 ), and hence Mm can be constructed in time 0{rn?‘) with Mm-i given. 
In other words, Mn, which specifies the joint distribution of cherry and pitchfork under the YHK model, can 
be computed in O(n^). Note that an alternative way of computing the joint distribution of and Cn under 


the YHK model is proposed in |Disanto and Wiehe (2013), which is based on integrating and differentiating 
generating functions. 

For later use, we rewrite the recursion in Theorem in the following functional form. 


Theorem 2 Let (^ : K x M —^ M be an arbitrary function. Then, under the YHK model, we have 

2 1 

Cn+l) — T(^n: ^n)] F ~^y[^n 1; Cn F 1)] 

F ~^y [i^n ~ ■'^n) F 1, Cn)] F ~^y YC' ~ ~ 2C„) ip(An, Cn F 1)] 

for n> 2. 


Proof Consider the indicator function I(a,b) on K x M defined as 

{ 1 if a; = o and y = h, 

0 otherwise. 

We multiply the equation in Theorem by <^(a, b) and rewrite them as follows 

Ey [(^(H^_|_l, Cn-\-l)h(a,b)iAn+l, Cn+l)\ — ^ { 2Ey , Cn)L(^a,b){An, Cyj)] 

F Ey [An<p{An — 1, Cn F l)d(a,b) {An ~ 1, Cn F 1)] F 2Ey [(C^ — An)‘p{An F 1, Cn)I{a,b) {An F 1, Cn)] 

F Ey [{'IT' ~ An — 2Cn)‘p{An, Cn F l)I(^n,b) (^nj Cn F 1)]} • 

Summing over all a and b completes the proof. □ 

In the remainder of this section we study cherry and pitchfork distributions using Theorem We begin 
with a functional recursion on cherry distribution Cn- This enables us to show that the cherry distribution is 
log-concave under the YHK model, and obtain an alternative approach to computing the central moments of 
cherry distribution. 

Proposition 2 Let tp : M. ^ M. be an arbitrary function. Then we have 

Eyi;{Cn+i) = ^Ey[2Cn fj{Cn) F (n - 2C„) fj{Cn F 1)] (9) 

for n > 2. In particular, we have Ey{C 2 = 1) = 1, Py(C '2 = fc) = 0 for k ^ 1, and 

oZj* ry) _2/r 2 

Ey{Cn+l =k)= —¥y{Cn = k) + - -^¥y{Cn = k-l) ( 10 ) 


for n > 2 and 1 < k < n. 
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Proof For ■)/; : M —>■ M given in the statement of the proposition, we define y) = a function on K x K. 
Applying Theoremj^to the function ip* leads to Eq. (|^. Eq. (10 1 follows from Eq. ([^ by taking '0(a;) = hix), 
where Ikix) equals 1 ii x = k, and 0 otherwise. □ 


Using the last proposition, the mean and the variance of cherry distribution can be obtained by substituting 
'0(a;) = X and 'ip{x) = respectively, in the recursive equation Eq. (|^ in Proposition]^ 

Corollary 2 (Heard , 1992- McKenzie and Steel, 2000) We have Ey(C'„) = n/3 for n > 2 and 'Vy{Cn) = 
2n/45 for n>5. 


Recall that a sequence of numbers, {j/i,... ,ym}, is said to be positive if each number in the sequence is 
greater than zero. It is called log-concave if yk-iyu+i < y\ holds for 2 < fc < m— 1. Clearly, a positive sequence 
{yk\i<k<m is log-concave if and only if the sequence {yklyk+i\i<k<m-i is increasing. Therefore, a log-concave 
sequence is necessarily unimodal, that is, there exists an index 1 < k < m such that 


yi < 92 < ■ ■ ■ < yk > yk-ki > ■■■ >ym (il) 

holds. Finally, a non-negative integer valued random variable Y with probability mass function {pk : k > 0} 
is log-concave if {pk}k>o is a log-concave sequence. 

To show that the probability density function of C„ is log-concave, we need the following lemma. 

Lemma 3 Let zi, Z 2 , z^, Z 4 be four positive numbers with z\ > ^123 and z| > 22 ^ 4 - Then we have 

Z2ZZ > Z1Z4, and Z1Z3 -I- Z2Z4 > 2ziZ4. 

Proof Since Zi are positive for 1 < z < 4, from > Z 1 Z 3 and z| > Z 2 Z 4 if follows that 

zi “ 22 “ 23' 

Hence we have 


22^3 > Z1Z4, 

which completes the proof of the first inequality in the lemma. 


( 12 ) 


To prove the second inequality in the lemma, we consider the following two cases. 

Case 1 : zi > Z2. Together with z\ > Z4Z3, this implies Z2 > Z3, and hence Z3 > Z4 in view of zf > Z 2 Z 4 . 
Therefore, we have 

(zi - Z2)(Z3 - Z4) > 0 . 


This leads to Z 1 Z 3 -|- 22^4 > Z 1 Z 4 + Z 2 Z 3 > 2 ziZ 4 , where the last inequality follows from Eq. (12|. 

Case 2 : zi < Z 2 - If Z3 < Z4, then we have (zi — Z 2 )(z 3 — Z 4 ) > 0 , and hence Z 1 Z 3 -j- Z2Z4 > Z1Z4 + Z2Z3 > 2 ziZ 4 , 
as required. Therefore, we may assume that Z3 > Z4. This implies Z 1 Z 3 > Z 1 Z 4 and Z 2 Z 4 > Z 1 Z 4 , and hence 
Z 1 Z 3 -I- Z 2 Z 4 > 2 ziZ 4 , as required. □ 


Using the last lemma, we present the following theorem concerning the log-concavity of the cherry distri¬ 


bution under the YHK model. 
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Theorem 3 Under the YHK model, we have 

Py(C„ = kf > = fc + l)Py(C„ = k - \) (13) 

for n > 2 and 1 < k < n. 


Proof For simplicity, we put an,k ■= Pi/(Cn = k). We prove this theorem by induction; the basic case n = 3 is 


straight-forward. Now assuming that n> 3 and Eq. (131 holds for all 1 < A: < n, it suffices to show that 

^n+l,k — 0,n+l,k-lCin+l,k+l (14) 


for all 1 < fc < n. Using the recursion described in Eq. (10 1 , we have 

^n+i,k = 4fc^a^,fc + {n + 2- 2fc)^a^ -h 4/c(n + 2- 2k)an,kan,k-i 
and an+i^k-ian+i,k+i is equal to 


( 2 /c “t" 2 )( 2 fc ‘^^^n,k—l^n,k-\-l F { 2 k 2 )(?r 2 k -t- 4 :^an^k— 2 ^n,k-{-l 

+ (n - 2 k){ 2 k - 2 )an,kan,k-i + {n- 2 k){n - 2 k + 4 )a„,fca„,fc_ 2 . 


Therefore, by the inductive assumption and Lemma we have 
2 

^n+l,fc ^n+l,fc —l^ln+l,fe+l 

= 2 [fc(n - 2k) + {n + 2k)]{an,kan,k-i - an,k- 2 an,k+i) + 4:k‘^{aij,, - an,k-ian,k+i) 

F 4Lan^k—l{^n,k — l ^n,k—2) F 4(72 2 2k) {a^ j^_i_i an,k(^n,k — 2) F ^an,k—2{^n,k ^n,k+l) 

F '^ank+l{^n,k—l ^n,7c— 2 ) F 4u^/j_2 (o-n./c ^n,k+l) 

— ^{^n,k—l^n,k+l F an,k^n,k —2 ‘^^n,k — 2 (^n,k+l) F 0, 


from which Eq. (14) follows, as required. 


□ 


In the next result we compute the mean and the variance of pitchfork distribution under the YHK 
model, and calculate the covariance and correlation of and Cn- Note that the mean and the variance of 
An was also obtained by Rosenberg (2006 Theorem 4.4). Since the proof is similar to that of Corollary]^ we 
only outline the main step here. 


Proposition 3 For n>7 we have 

T1 77 2 j?7 

'Ey{An) = -, Covy{An,Cn) =and = ^' (1^) 

Proof Applying Theorem]^ to (f{x,y) = x and using Corollaryit follows that 

Ej/(An+l) = —Ey[2 A^ + An(An — 1) F 2(Cn — An){An F 1) F (u — An — 2Cn)An\ 

2 n — 3 

- 3 F —E^(A„) 

holds for n> 2. Together with Ej,(A 3 ) = 1, we have Ej,(A„) = 72/6 for 72 > 4, as required. 
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Next, applying Theorem]^ to the function ip{x,y) = xy shows that 

holds for n > 2. By Corollary and Ey(A„) = n /6 it follows that 

(it 

C0Vy{Ar,+uCn^l) =Ey{An+lCn^l) - ^ ^ 

lo 

2(5n + 2 ) (n+l )2 

= _E,(A„C„) + — + -^ 

“ 5^ , . ^ , 2 

- ^ OV y i^Aji ^ Cy flj :— 

n 15 

holds for n > 5. Solving the last recursion equation, we obtain CoVy{An,Cn) = —nj45 for n > 6 , as required. 

Now the formula on Vj,(A„) can be established by an argument similar to that for CoVy{An,Cn) by 
applying Theoremj^to the function (p{x,y) = x'^. □ 

Interestingly, the last proposition imply that the correlation coefficient between the cherry and pitchfork 
distribution under the YHK model is a negative constant for n > 7. Note that negative correlation is to be 
expected as the more cherries are found in a tree, the more likely that there are fewer pitchforks in that tree. 

Corollary 3 Under the YHK model, the correlation coejficient Py{An,Cn) between An and C„ is —-\/l4/69, 
which is independent of n for n>7. 

Proof The proposition follows directly from Corollaryand Proposition!^ □ 

4 Subtree Distributions under the PDA model 

In this section, we shall investigate the cherry and pitchfork distributions under the PDA model. Similar to 
the study on the YHK model in Section our starting point is the following recursion relating the joint 
distribution of cherries and pitchforks. 

Theorem 4 We have 

Eu(^n+i = a, Cn+i = b) = -----P„(A„ = a, C„ = 6 ) + ---P„(A„ = a + 1 , Cn = 6 — 1 ) 

2 n — 1 2 n — 1 

+ + = a - 1,C„ = 6) + »- Q-26 + 2 p^ =a,Cn = b-l) 

Zn — 1 2n — 1 

for n > 3 and 1 < b < n. Moreover, Pii(A 3 = a,C^ = b) equals to 1 if (a, 6 ) = (1, 1) and 0 otherwise. 

Proof We give a sketch of the proof as it is similar to the proof of Theorem 

The only modifications needed are the conditional probabilities in the four cases there. For case (i), 
by Proposition this case occurs if and only if e„ S E^fTn), and hence the conditional probability 
is |£' 4 (T„)|/|£ 1 (T„)| = (n + 3a — 6 — l)/(2n — 1) by Lemma Using similar arguments, for case (ii), 
the conditional probability is |i?i(r„)|/|£’(r„)| = (a + l)/(2n — 1). For case (iii), the conditional prob¬ 
ability is |£' 2 (Tji)|/|i?(Tji)| = 3(6 — a -I- l)(2n — 1). Finally, for case (iv), the conditional probability is 
\E 3 {Tn)/\E(Tn)\ = {n — a — 2b + 2)f{2n — 1). The rest of the proof proceeds as in the proof of Theorem 

E □ 
















12 


Taoyang Wu, Kwok Pui Choi 


Using an approach similar to the remark after Theorem[^ the last theorem leads to a dynamic programming 
approach to compute the joint distribution of cherry and pitchfork. In addition, we present the following result 
which will enable us to study the moments of and whose proof is similar to that of Theorem]^ and 
hence omitted. 

Theorem 5 Let : K x M —>■ R be an arbitrary function. For n > S we have 
^u^-PiAn+l: Cn+l) 

= X-+ 3An — Cn — 1) (p{An, Cn)\ + g-~ 1; Cn + 1)] 

zn — 1 zn — 1 

+ -^Eu[(C'„ — An) (p{An + 1, Cn)] + ~ An — ‘2Cn) ip{Am Cn + 1)]. 


In the remainder of this section we shall apply Theoremj^to study cherry and pitchfork distributions under 
the PDA model. To begin with, we present the following functional recursion between cherry distribution, 
which will enable us to obtain the exact formula for cherry distribution and show that cherry distribution is 
log-concave under this model. 


Proposition 4 Let ip : 


be an arbitrary function. Then for n > 2 we have 


and 


EniPiCn+i) = -^E„[(n + 2Cn - 1) i/'(C„)] + - 2C„) iP{Cn + 1)] (16) 

zn — 1 zn — 1 


P„(C„+i=fc) = ^^i^^P„(C„ = fc) + ^^-^^P„(C„ = A:-I), I<fc<n. (17) 


2n- 1 


2 n - 1 


Proof Let i/; : R —i R be an arbitrary function as in the statement of the proposition. Then (p*{x, y) = ip{y) is 
an function on R x R. Now applying Theoremj^to the function ip* leads to Eq. (16). Finally, Eq. ( [l7| follows 
from Eq. (16) by taking ip{x) = Ik{x), where Ik{x) equals 1 if a; = A:, and 0 otherwise. □ 


Note that the recursion presented in the last proposition enables us to study the moments of cherry 
distribution under the PDA model. As an example, we present below an alternative computation for the mean 
and the variance of C„. Since the techniques used to solve difference equations under this model is rather 
different from that used under the YHK model (i.e., Corollary [^, a complete proof is included here. Note that 
in the proof we will use the following well-known Faulhaber’s formulae (also known as Bernoulli’s formulae) 


concerning the sum of powers of integers (see e.g. Conway and Guy, 1996). 


n 


n{n -I- l)( 2 n -|- 1 ) 

6 


i{n -I- l)(2n -I- l)(3n^ -I- 3n — 1) 
“ 30 


E*^ = 
2=1 
n 

E*^ = 


^■{n + 1 )^ 


^'{n -I- l)^(2n^ -I- 2n — I) 


12 


Corollary 4 (Chang and Fueh^ 201(J\ Proposition 5) For n > 2 we have 


E.(C„) - 2 ( 2 ^ _ 3 ) 4 and V„(C„) - 2(2n - 3)2(2n - 5) 
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Proof We may assume that n > 3 in the remainder of the proof as the case n = 2 clearly holds. Substituting 
tp{x) = a: in the recursive equation Eq. (16 1 in Propositionleads to that 


1 


IEu(C'n+l) — Z-yEu [(n + 2Cn — l)Cn + (n — 2Cn){Cn + 1)] — 


2n — 3 




2 n-l .... V .. .j 2 n-l 2 n - 1 

holds for n > 2. Together with the initial condition £^((( 72 ) = 1, multiplying the both sides of the last difference 
equation on E„((7„) by 2n — 1 and solving it leads to 

n{n — 1 ) 


( 2 n — 3)E„(C„) — 1 + • • • + (n — 1) — 


for n > 2 , as required. 


For simplicity, let f{n) = (2n — 3)(2n — 5) and g(n) = (n — l)(n^ — 2n — 1). Then 
” ” n(n — l)(n^ — n — 4) 


g{k) = + fc + 1 ) = 


fc=i 


fc=i 


Next, applying Proposition]^ to the function ip{x) = x'^ implies that 

[ (« + 2^^ - l)Cl + (n - 2C'„)(C„ + 1)^] 

_ g(.n+l) 2n-5 ^ . 

{2n - l)(2n - 3) 2n - i 

holds for n> 2. Now multiplying /(n + 1) on both sides of the above recursion leads to 

/(n)E„(C2) - f{n - l)E„(C^i) = g{n) 
for n > 3. Since E„(C'|) = 1 = —g{2) and 5 ( 1 ) = 0, we have 

(2n - 3)(2n - 5)E„(C2) = — 

k=l 

for n > 3, from which we have 

n(n — l)(n^ — n — 4) 


^uiC^n) = ■ 


4(2n-3)(2n-5) 


and hence Vy(C„) follows. 


(18) 

□ 


Another consequence of the recursion in Proposition]^ is the following exact formula on cherry distribution 
for the PDA model, whose proof is a straightforward application of induction and hence omitted here. 


Theorem 6 For n > 2 and 1 < k < n/2 we have 


.(a = k) = 


n—2k 


n\{n — l)!(n — 2)!2 

2 ^)!( 2 n - 2 j!fc!(fc”l)T ' 


(19) 


Interestingly, a similar formula for unrooted trees was obtained by Hendy and Penny (19821, that is, the 


probability that a random tree generated by the PDA model contains exactly k cherries is 

n!(n-2)!(n-4)!2"-2'= 


( 20 ) 


(n-2A:)!(2n-4)!fc!(fc-2)! 

for 2 < fc < n/2 (see, also McKenzie and Steel 2000, Theorem 4). A direct consequence of Theorem]^ is that 
the cherry distribution under the PDA model is log-concave, and hence also unimodal. 
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Theorem 7 For n >2 and 1 < k < n we have 

P«(C'„ = kf > P„(C„ = k + 1)P„(C„ = k-l). (21) 

Moreover, let A{n) = ^” 2 ( 2 ^+!)^^ • 

^^(C'n = A: — 1) < Pti(C'„ = fc) /or 1 < A: < A{n), and Pu(C'„ = k) > P„(C'„ = A: + 1) /or Z\(n) <k< n/2. 


Proof Since Pu(C'„ = A:) = 0 for A; > n/2, the theorem clearly holds for A: > n/2 — 1. Hence in the remainder 
of the proof we may assume k < {n — 2)/2. Now by Theorem]^ we have 

P«(C'n = A:-l) 4A:(fc-l) 


i{Cn = k) {n — 2k + l)(n — 2A: + 2) 


:= g{k,n). 


( 22 ) 


Considering the function g{k,n) defined in Eq. (22), then gik + l,n) > g{k,n) holds for 1 < A; < (n — 2)/2. 


This, together with Eq. (22), completes the proof of Eq. (21). 

The second part of the theorem follows from the observation that g{k,n) > 1 if and only if A: > A{n). □ 

Now we apply Theorem to study pitchfork distribution, and the joint distribution between pitchforks 
and cherries under the PDA model. Note that the mean and the variance of pitchfork distributions under 


this model were also derived by Chang and Fuchs (2010 Proposition 5). Since the proof is similar to that in 
Corollary]^ we only outline the main steps used here. 

Proposition 5 For n> 3 we have 

E„(A„) = 

C OV , C — 

V„(A„) = 


n(n — l)(n — 2) n 
2(2n-3)(2n-5) ^ 8’ 

—n(n — l)(n — 2)(n — 3) 


n 

2{2n - 3)2(2n - 5)(2n - 7) "" ~32’ 

3n(u — l)(n — 2)(n — 3)(4n^ — 40n^ + 123n — 110) 3n 


4(2n - 3)2(2n - 5)2(2n - 7)(2n - 9) 

Proof Applying Theorem]^ to the function (p{x,y) = x and using Corollarywe have 


(23) 

(24) 

(25) 


E^i(A^_i_i) — 


3n(n — 1) 


2n- 5 


E„(A„) 


2(2n- l)(2n-3) 2n - 1 

for n> 2. Now Eq. (23) follows by solving the last recursion with an approach similar to that in Corollary]^ 


To this end, applying Theorem]^ to the function ip{x,y) = xy implies that 




2n - 1 

holds for n> 2. Solving this recursion we have 

7 

Eu(A„C„) = - 


4(2n- l)(2n- 3)(2n- 5) 


i{n — l)(n — 2)(n^ — 3n — 2) 
4(2n-3)(2n-5)(2n-7) ’ 


from which Eq. (24) follows. 


Finally, applying Theorem]^ to the function ip(x,y) = x^ shows that 

2 ^ ,^ 2 ^ , 9{n+l) 

2n-l 4(2n-l)(2n-3)(2n-5)(2n-7) 


(26) 
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holds for n > 2. Solving the above recursion leads to 

n{n — l)(n — 2){n^ — 4n^ — 17n + 66) 


= 


4(2n - 3) (2n - 5) (2n - 7) (2n - 9) ’ 


from which Eq. (25) follows. 


□ 


We end this section with the following correlation result for the PDA model. 
Corollary 5 For n> A we have 

Pui^Afij Cn'j — 


2(2n- 5)(2n- 9) 


3(2n-7)(4n3-40n2 + 123n-110) ^n' 

In addition, {|pu(A„, C „)|}„>4 is a decreasing sequence converging to 0. 


(27) 


Proof Note first that Eq. (27) follows from Corollary]^ and Proposition]^ Since the sequence {|p„(A„, C „)|}„>4 
clearly approaches 0, it remains to show that this sequence is decreasing. To this end, it suffices to show that 
the ratio 

Pu {An , Cn') 


R{n) = 


Pu{An-\-l, Cyi-|_i)2 


is greater than 1 for n > 4. Using Eq. (27), we have 

(2n - 5)2(2n - 9)(4(n + l)^ - 40(n + 1)^ + 123(n + 1) - 110 
~ (2n - 7)2(2n - 3)(4n3 - 40n2 + 123n - 110) ' 

By numerical computation, we can check that R{n) > 1 for 4 < n < 15, therefore we may assume that n > 15 
in the remainder of the proof. Now denoting the numerator and denominator of R{n) by Ri{n) and i? 2 (n), 
respectively, then we have 

Ri{n) - i? 2 (n) = 64n® - M4n* + 5408n^ - 1504871^ + 20436n - 10995 
> 64n'‘(n - 15) + 540871^71 - 15) + 20436(77 - 15) > 0 


for 77 > 15. This implies R{n) = Ri{n)/R 2 {n) > 1 for 77 > 15, as required. 


□ 


5 A comparative study of two models 

In this section, we compare and contrast the distributional properties of the number of cherries and the 
pitchforks in random trees generated under the YHK and the PDA models. 

To begin with, note that the recursions in Theorems and provide us exact computation of the joint 
distribution, and hence also the marginal distributions, of A„ and C„ under the two models. For example, the 
joint distributions with n = 200 for the two models are depicted in Fig.[^ which suggests the joint distributions 
under both models can be well-approximated by bivariate normal distributions. In addition, when 77 = 200, the 
maximum for the joint distribution under the YHK model is 0.177..., which is attained at (A„, Cn) = (33,67). 
On the other hand, under the PDA model the maximum is 0.145..., which is attained at (A„, Cn) = (25, 50). 
On average, the result below shows that trees generated by the YHK model contain more cherries and more 
pitchforks than trees of the same size generated by the PDA model. 
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Proposition 6 For n > 3, we have 


and 


E„(C„) <E^(C„) < -E„(C„) 


E„(A„) < EyiAn) < -Ei,(A„). 


Proof By Corollaries and we have 


E,;(C'„) = 


1 + 


n — 3 


Eu(C'„), 


3(n — 1) 

from which Eq. (28) follows. Similarly, by Propositions and we have 

— 7n + 9 


Ey{An) = 


1 + 


3(n — l)(n — 2) 


E.u{An), 


from which Eq. (29) follows. 


Next, we study the variances of cherry and pitchfork distributions under the two models. 
Proposition 7 For n > 5, we have 

32 49 

^V,(C'„)<V„(C„)<-V„(C„). 

Proof Let = 'Vy{Cn)/'Vu{Cn)- Then by Corollariesandwe have 

4(2n- 3)2(2n- 5) 


This implies 


45(n-l)(n-2)(n-3)' 

Rn n(2n — 3)(2n — 5) 2n + 3 

~ j~ 1 ■ 


+ 


Rn+i (n-3)(2n-l)2 (n-3)(2n-l)2 

and hence that is decreasing in n. Noting that lim„_>oo Rn = f§ i we have 

32 o D 49 ^ 

45 < < -^5 - 54 < 1. 


> 1 , 


(28) 


(29) 


□ 


from which the proposition follows. 


Proposition 8 For n> 7, we have 

1.168 V„(A„) < —V„(A„) < Vj,(A„) < ^V„(A„) < 1.281 V„(A„). 

Proof Let i?„ = Yy{An)/Yu{An) for n > 7. Then by Propositionandwe have 

23(2n - 3)2(2n - 5)^(2n - 7)(2n - 9) 


and hence 


Rn = 


Rn 


315(n - l)(n - 2)(n - 3)(4n3 - 40n2 + 123n - 110) ’ 
n(2n - 5)(2n - 9)(4n3 _ 28n^ + 55n - 23) 


Rn+i {n - 3)(2n - l)2(4n3 - 40^2 + 123n - 110) 
2(24n3 - 18071^ + 382n - 165) 


= 1 + 


(n - 3)(2n - l)2(4n3 - 40n2 + 123n - 110) 
Therefore, is strictly decreasing in n and we have 

23 • 4 • 4 • 4 368 

= ^ hm W... = 

v,(A„) 


> 1 . 


Vy(^n) ^ ^ ~ ' = ^ > 1.168 . 


315-4 


315 


This, together with R’j = < 1.281, completes the proof. 


□ 


□ 
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Proposition shows that trees generated by the YHK model have smaller variation in the number of 
cherries than trees of the same size generated by the PDA model. On the contrary, Proposition shows that 
YHK model generates trees with larger variation in the number of pitchforks than the PDA model does. This 
is not unexpected as the covariances of cherries and pitchforks are found to be negative by Propositions 
andlH 

Now we present a result concerning the correlation coefficients between the cherry and the pitchfork 
distributions under the two models. Since these two distributions are negatively correlated, we will focus on 
their absolute values. 

Proposition 9 For n > 7, we have \py{An,Cn)\ > \pu{An,Cn)\- Moreover, { }ri .>7 ® monotoni- 

cally decreasing sequence with limit 0. 

Proof This follows from Corollary and and the observation that 

I 30 

|p„(A7,C7)| = Y ^ ^ < |pj,(A7,C7)| = 

□ 




Fig. 3: Plot of the ratio Py(C'„ = fc)/P„(C'„ = k) for n = 200 and 1 < fc < 100. Here the y-axis is in the 
logarithmic scale; the dotted line y = 0 indicates the ratio being 1. 


Proposition 1^ states that the mean of C„ is greater under the YHK model than the PDA model. In the 
remainder of this section, we shall present a more detailed study on Cn- Intuitively, it is easy to see that the 
number of cherries contained in a random tree generated by the YHK model is likely to be greater than that 
by the PDA model: firstly, by Proposition we know that the number of cherries in T[e], the phylogenetic 
tree obtained from T by attaching a new leaf to edge e in T, is strictly greater than that in T precisely 










18 


Taoyang Wu, Kwok Pui Choi 


when e is a pendant edge of T; secondly, in the YHK process the edge to which the new leaf is attached is 
sampled only from the pendant edges while in the PDA model that edge is sampled from all possible edges. 
Indeed, this intuition can also be corroborated by numerical results. As an example, considering the ratio of 
^y{Cn = k)/f'u(,Cn = k) with 71 = 200 as depicted in Fig. [fusing a logarithmic scale, then it is clear that the 
ratio is strictly increasing and is greater than 1 when k is greater than a certain value. The following theorem 
establishes the existence of a unique change point between the two models for n > 4. Note that a similar 


phenomenon is shown to hold for clade sizes by Zhu et al (2015 Theorem 5). 


Theorem 8 Suppose n > 3. The ratio Pj,(C'„ = fc)/P„(C'„ = k) is strictly inereasing for 1 < k < n/2. In 
particular, there exists a number Tn with 1 < Tn ^ nl2 such that 

Vy{Cn = k) < ¥u{Cn = k) for 1 < k < Tn, and Py(C'„ = k) > P„(C„ = k) for r„ < fc < n/2. 


Proof For simplicity, put a„ = Py(C'„ = k). By Eq. (22), it suffices to show that 


fik,n) =: 


k — l 
nk - 


Vu{Cn =k-l) 


4:k{k — 1) 


:= g(k,n) 


(30) 


'u{Gn = k) (n — 2fc + l)(n — 2fc + 2) 
holds for 1 < A: < n/2. To this end, we shall use induction on n. The base case n = 3 is clear because 

^y{Gz = 1) = Pm(C 3 = 1) = 1. For induction step, assuming that f{k,m) < g{k,m) holds for a given m > 3 

and all 1 < fc < m/2, it remains to show that /(fc,m + 1) < g{k,m + 1) for all 1 < fc < (m + l)/2. 

Note first that /(I, m + 1) = 5 ( 1 , m + 1) = 0. Now fix 2 < fc < (m + l)/2, then > 0. Since Proposition]^ 

implies 

fc-i 2 fc — 2 m — 2fc + 4 i ,_2 fc 2fc fc m — 2k2 


®m+l “ 


m 


m 


and , 1 = — 
m 


m 


it follows that Eq. (30) is equivalent to 

(2fc — 2)n^ ^ + (m — 2k -\- 4)a^ ^ < (2ka!^ (m — 2k + 2)u^ ^ T 1)* 

Since > 0, dividing both sides of the last inequality by leads to 

(2fc — 2) + (m — 2fc + 4)/(fc — 1, m) < _ 2fc + 2)g{k, m + 1). 

f[k,m) 

By induction assumption we have /(fc — l,m) < g{k — l,m) and 0 < /(fc,m) < g(k,m), hence it remains to 
show that 


(31) 


(2fc — 2) + (m — 2fc + 'T)g{k — 1, m) < 2fc ^^^’ ^~*~ ^ + (m — 2fc + 2)g{k, m + 1). 


(32) 


g{k,m) 

To this end, denoting the left and the right side of Inequality ( [3^ by Llk,m) and R{k,m), respectively, then 
we need to show that R{k,m) — L(k,m) > 0. Note first that 

4(fc-l)(fc-2) 


On the other hand, since 


we have 


L{k, m) = {2k — 2) + 

g{k,m + l) m — 2k + 1 
g{k,m) m — 2fc + 3 

4/r 

R{k, m) = 2k — 


m — 2fc + 3 


= 1 - 


m — 2fc + 3 ’ 
4fc(fc — 1) 


m — 2fc + 3 m — 2fc + 3 






















On joint subtree distributions under two evolutionary models 


19 


Therefore, we have 


R(k, m) — L{k, m) = 2 + 


= 2 


4fc(fc — 1) — 4fc — 4(fc — 1)(A: — 2) 


4(fc - 2) 


m — 2k + 3 
> 0 , 


m — 2k + 3 

as required. Here the last inequality follows from m> 3 and 2 < fc < (m + l)/2. 


□ 


6 Discussion and Conclusion 


Tree shape indices are summary statistics of some aspect of the shape of a phylogenetic tree, particularly the 


‘balance’ of a tree. Since the introduction of the first tree shape index by Sackin (1972|, many such indices 


have been proposed (see Mooers and Heard (1997) for an excellent review and Mir et al (2013) for some recent 
development). 

In this paper we present several results concerning the distributions of cherries and pitchforks under the 
YHK and PDA models. Our main results include two novel recursive formulae on the joint distributions of 
cherries and pitchforks under these two models, which enable us to numerically compute their joint probability 
density functions (and hence also the marginal distributions) for trees of any size numerically. This is relevant 
because one of the main applications of tree indices is their use as test statistics to discriminate stochastic 


models of evolution (see e.g. Blum and Frangois, 2006 and the references therein), and we will pursue this 
application more thoroughly elsewhere. 



Fig. 4: Plot of the total variation distance between the joint distributions of cherry and pitchfork and discretised 
bivariate normal distributions for 10 < n < 200 under the YHK model (solid line) and PDA model (dotted 
line). 

Our numerical results (e.g. Fig. indicate that the limiting joint distributions of cherries and pitchforks 
can be well approximated by bivariate normal distributions. For the YHK model, this was recently confirmed 
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by Holmgren and Janson (2015) and it remains open to establish the analogous result for the PDA model. In 


addition, it would be interesting to study the rate of the convergence. 

In this paper we concentrate on rooted trees, but it is of interest to investigate to what extent the results 


obtained in this paper for rooted trees can be carried over to unrooted trees. For example, using Eq. (20) 


and an argument similar to the proof in Theorem it follows that the cherry distribution of unrooted trees 
under the PDA model is also log-concave, and hence unimodal. However, whether the same property holds for 
the cherry distribution of unrooted trees under the YHK model remains open. One challenge is to derive the 
recursions for unrooted trees as in Theorem [T] or an exact formula as in Theorem 0 

To end this article, we mention several additional questions. For instance, is pitchfork distribution, or other 
subtree distribution, log-concave? Our numerical calculation suggests the pitchfork distribution is log-concave. 
A related question is whether there also exists a unique change point for other subtree distribution. Finally, 
cherry pattern and pitchfork pattern are closely related to instances of recursive shape index (in the sense 


of Matsen (2007)), therefore it would also be of interest to see whether some of the properties obtained here 


can be carried over to some other tree indices as well. 
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